XML-Enabled Association Analysis
Author
Abstract
The discovery of association rules from large amounts of structured or semi-structured data is an important data mining problem [Agrawal et al. 1993, Agrawal and Srikant 1994, Miyahara et al. 2001, Termier et al. 2002, Braga et al. 2002, Cong et al. 2002, Braga et al. 2003, Xiao et al. 2003, Maruyama and Uehara 2000, Wang and Liu 2000]. It has crucial applications in decision support and marketing strategy. The prototypical application of association rules is market basket analysis using transaction databases from supermarkets. These databases contain sales transaction records, each of which lists the items bought by a customer in one transaction. Mining association rules is the process of discovering knowledge such as “80% of customers who bought diapers also bought beer, and 35% of customers bought both diapers and beer”, which can be expressed as “diaper ⇒ beer” (35%, 80%). Here 80% is the confidence of the rule, and 35% is its support, indicating how frequently customers bought both diapers and beer. In general, an association rule takes the form X ⇒ Y (s, c), where X and Y are sets of items, and s and c are the support and confidence, respectively.

In the XML era, mining association rules faces more challenges than in the traditional well-structured world, owing to the inherent flexibility of XML in both structure and semantics [Feng and Dillon 2005]. First, XML data has a more complex hierarchical structure than a database record. Second, elements in XML data have contextual positions and therefore carry a notion of order. Third, XML data appears to be much larger than traditional data. To address these challenges, the classic association rule mining framework, which originated with transactional databases, needs to be re-examined.
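The support and confidence of a rule X ⇒ Y (s, c) in the classic transactional setting can be illustrated with a minimal sketch. The function name `rule_metrics` and the basket data are hypothetical, introduced here only for illustration. The data are constructed to reproduce the percentages in the diaper and beer example and do not come from any real dataset. The sketch covers only flat item sets, not the XML-specific extensions discussed above.

```python
from typing import Iterable, List, Set, Tuple


def rule_metrics(
    transactions: List[Set[str]], x: Iterable[str], y: Iterable[str]
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule X => Y over the transactions."""
    x_set, y_set = set(x), set(y)
    n_total = len(transactions)
    # Number of transactions containing every item of X.
    n_x = sum(1 for t in transactions if x_set <= t)
    # Number of transactions containing every item of both X and Y.
    n_xy = sum(1 for t in transactions if (x_set | y_set) <= t)
    # Support: fraction of all transactions that contain X and Y together.
    support = n_xy / n_total if n_total else 0.0
    # Confidence: fraction of the transactions containing X that also contain Y.
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical baskets (assumption): 80 transactions, 35 with diapers,
# of which 28 also contain beer.
baskets = [{"diapers", "beer"}] * 28 + [{"diapers"}] * 7 + [{"milk"}] * 45

s, c = rule_metrics(baskets, {"diapers"}, {"beer"})
print(f"diapers => beer (support={s:.0%}, confidence={c:.0%})")
```

With these made-up baskets the support is 28/80 = 35% and the confidence is 28/35 = 80%, matching the rule “diaper ⇒ beer” (35%, 80%) in the text.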
Similar resources
Mining XML-Enabled Association Rules with Templates
XML-enabled association rule framework [8] extends the notion of associated items to XML fragments to present associations among trees rather than simple-structured items of atomic values. They are more flexible and powerful in representing both simple and complex structured association relationships inherent in XML data. Compared with traditional association mining in the well-structured world...
An XML-Enabled Association Rule Framework
With the sheer amount of data stored, presented and exchanged using XML nowadays, the ability to extract knowledge from XML data sources becomes increasingly important and desirable. This paper aims to integrate the newly emerging XML technology with data mining technology, using association rule mining as a case in point. Compared with traditional association mining in the well-structured worl...
SQL/XML Hierarchical Query Performance Analysis in an XML-Enabled Database System
The increased utilization of XML structure for data representation, exchange, and integration has strengthened the need for efficient storage and retrieval of XML data. Currently, there are two major streams of XML data repositories. The first stream is the Native XML database systems which are built solely to store and manipulate XML data, and equipped with the standard XML query language kn...
A Comparative Analysis of XML Documents, XML Enabled Databases and Native XML Databases
With the increasing popularity of XML data and a great need for a database management system able to store, retrieve and manipulate XML-based data in an efficient manner, database research communities and software industries have tried to respond to this requirement. XML-enabled database and native XML database are two approaches that have been proposed to address this challenge. These two appr...
Towards Achieving an Optimum Performance of XML Data into Both Types of XML Databases: XML-Enabled Databases and Native XML Databases
EXtensible Markup Language (XML) promises to be the standard language for data representation in e-business, particularly when that data is exchanged over or browsed on the Internet, since it is nested and has a self-describing structure that provides a simple yet flexible means for business applications to model and exchange data. There are two alternative database types used for storing ...